1 Random walks

Let $G = (V, E)$ be an undirected graph. Consider the random process that starts from some vertex $v \in V(G)$ and repeatedly moves to a neighbor of the current vertex chosen uniformly at random.

For $t \ge 0$ and for $u \in V(G)$, let $p_t(u)$ denote the probability that you are at vertex $u$ at time $t$. We can think of $p_t$ as a vector in $\mathbb{R}^n$, where $n = |V(G)|$. Clearly,

$$\sum_{u \in V(G)} p_t(u) = 1.$$

Observe that if you are at a vertex $u$ at time $t$, then at time $t+1$ you are at each neighbor $v$ of $u$ with probability $1/d(u)$, where $d(u)$ denotes the degree of $u$. So,

$$p_{t+1}(v) = \sum_{(u,v) \in E(G)} \Pr[\text{at } u \text{ at time } t] \cdot \Pr[\text{go to } v \text{ at time } t+1 \mid \text{at } u \text{ at time } t] = \sum_{(u,v) \in E(G)} \frac{p_t(u)}{d(u)}.$$

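We can write this using matrix notation as follows. Define the matrix $W = (W_{i,j})$ by

$$W_{i,j} = \begin{cases} 1/d(j) & \text{if } (i,j) \in E(G), \\ 0 & \text{otherwise.} \end{cases}$$

Note that $W_{i,j}$ is the probability of going from $j$ to $i$, so that $p_{t+1} = W \cdot p_t$. We have

$$W = A \cdot D^{-1},$$

where $A$ is the adjacency matrix of $G$, and $D$ is the diagonal matrix with $D_{i,i}$ the degree of the $i$-th vertex of $G$.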
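To make the definition concrete, here is a minimal Python sketch that builds $W = A \cdot D^{-1}$ from an adjacency list and applies one step $p_{t+1} = W \cdot p_t$. The example graph (a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2) and the helper names `walk_matrix` and `step` are hypothetical; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical example graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
# adj[v] lists the neighbours of v.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]

def walk_matrix(adj):
    """Build W = A D^{-1}: W[i][j] = 1/d(j) if {i, j} is an edge, else 0."""
    n = len(adj)
    W = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for i in adj[j]:
            W[i][j] = Fraction(1, len(adj[j]))
    return W

def step(W, p):
    """One step of the walk: p_{t+1} = W p_t."""
    n = len(p)
    return [sum((W[i][j] * p[j] for j in range(n)), Fraction(0)) for i in range(n)]

W = walk_matrix(adj)
p0 = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]  # start at vertex 3
p1 = step(W, p0)  # all mass moves to vertex 2, its only neighbour
p2 = step(W, p1)  # vertex 2 spreads the mass evenly over its 3 neighbours
```

Each column of `W` sums to 1, which is the matrix form of the fact that the walk never gains or loses probability mass.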

2 Stationary distribution 

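We define a probability vector $\pi$ which corresponds to the stationary distribution of the random walk. Let

$$\pi(u) = \frac{d(u)}{\sum_{v \in V(G)} d(v)}.$$

Claim 1 $\pi$ is a probability distribution.
Proof We have $\pi(u) \ge 0$ for every $u \in V(G)$, and

$$\sum_{u \in V(G)} \pi(u) = \frac{\sum_{u \in V(G)} d(u)}{\sum_{v \in V(G)} d(v)} = 1.$$

■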



We next show that, if the random walk follows the distribution $\pi$ at time $t$, then it has the same distribution at time $t+1$. This is expressed using matrix notation in the following claim.
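Claim 2 $W \cdot \pi = \pi$.

Proof Let $k \in V(G)$. We have

$$(W \cdot \pi)(k) = \sum_{(u,k) \in E(G)} \frac{\pi(u)}{d(u)} = \sum_{(u,k) \in E(G)} \frac{1}{\sum_{v \in V(G)} d(v)} = \frac{d(k)}{\sum_{v \in V(G)} d(v)} = \pi(k).$$

■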



This statement is equivalent to the matrix $W$ having eigenvalue 1, with corresponding eigenvector $\pi$ (note that, since $\pi$ is a multiple of the vector of node degrees, $D \cdot \mathbf{1}$, we could also take the latter as the eigenvector).
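Claims 1 and 2 can be checked exactly on a small hypothetical graph, a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2. The helper `apply_W` is a made-up name; it evaluates $(W \cdot \pi)(k)$ directly from the sum used in the proof of Claim 2.

```python
from fractions import Fraction

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # hypothetical example graph
deg = [len(nbrs) for nbrs in adj]
total = sum(deg)  # sum of degrees, i.e. 2|E|
pi = [Fraction(d, total) for d in deg]  # pi(u) = d(u) / sum_v d(v)

def apply_W(adj, p):
    """(W p)(k) = sum over neighbours u of k of p(u) / d(u)."""
    return [sum(p[u] / len(adj[u]) for u in adj[k]) for k in range(len(adj))]

print(pi)
print(apply_W(adj, pi) == pi)  # stationarity: W pi = pi
```

Here the degrees are $(2, 2, 3, 1)$, so $\pi = (1/4, 1/4, 3/8, 1/8)$, and one step of the walk leaves it unchanged.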

The natural next step at this point would be to claim that the random walk on a graph $G$ always converges to the stationary distribution $\pi$. This, however, turns out to be false, and it is easy to see why for a bipartite graph $G$. Consider for example the case $G = C_6$, the cycle on 6 vertices, and let the vertex set of $G$ be $V(G) = \{1, 2, \ldots, 6\}$. Assume without loss of generality that the random walk starts at time $t = 0$ at vertex 6. Then, at time $t$, the current vertex is odd if and only if $t$ is odd. Therefore, the walk does not converge to any distribution.
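The periodicity argument for $C_6$ can be checked directly. This sketch (with hypothetical helper names `neighbours`, `step`, `odd_mass`) runs the non-lazy walk exactly from vertex 6 at time 0, using the vertex labels $1, \ldots, 6$ from the text, and records the probability of being at an odd vertex.

```python
from fractions import Fraction

n = 6  # the cycle C_6 on vertices 1..6

def neighbours(k):
    """Neighbours of vertex k on C_6: k-1 and k+1, wrapping around."""
    return [(k - 2) % n + 1, k % n + 1]

def step(p):
    """Non-lazy walk; every degree is 2, so p_{t+1}(v) = sum of p_t(u)/2 over neighbours u."""
    return {v: sum(p[u] for u in neighbours(v)) / 2 for v in range(1, n + 1)}

def odd_mass(p):
    """Total probability on odd-numbered vertices."""
    return sum(p[v] for v in p if v % 2 == 1)

p = {v: Fraction(0) for v in range(1, n + 1)}
p[6] = Fraction(1)  # start at vertex 6 at time 0
history = []
for t in range(8):
    history.append(odd_mass(p))
    p = step(p)
```

`history` alternates between 0 and 1, so $p_t$ keeps jumping between the even and the odd side and has no limit.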

3 Lazy Random Walks 

There is an easy way to fix the above periodicity problem. We introduce a modified version of the original 
walk, which we call lazy random walk. In a lazy random walk at time t: 

• we take a step of the original random walk with probability 1/2, 

• we stay at the current vertex with probability 1/2. 

We can show that the above modification breaks the periodicity of the random walk. The transition probabilities are encoded in the following matrix:

$$W' = \frac{W + I}{2} = \frac{I + A \cdot D^{-1}}{2},$$

where $I$ denotes the identity matrix.
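Repeating the $C_6$ experiment with the lazy matrix $W'$ shows the periodicity disappearing. The sketch below (helper names hypothetical) keeps half of the mass in place and sends $1/4$ to each of the two neighbours. After a single step the odd vertices already hold exactly half of the mass, and after 60 steps the distribution is very close to uniform.

```python
from fractions import Fraction

n = 6  # the cycle C_6 on vertices 1..6

def neighbours(k):
    """Neighbours of vertex k on C_6: k-1 and k+1, wrapping around."""
    return [(k - 2) % n + 1, k % n + 1]

def lazy_step(p):
    """One step of W' = (W + I)/2: keep half the mass, send 1/4 to each neighbour."""
    return {v: p[v] / 2 + sum(p[u] for u in neighbours(v)) / 4 for v in range(1, n + 1)}

def odd_mass(p):
    return sum(p[v] for v in p if v % 2 == 1)

p = {v: Fraction(0) for v in range(1, n + 1)}
p[6] = Fraction(1)  # start at vertex 6 at time 0
odd_after_one = odd_mass(lazy_step(p))
for _ in range(60):
    p = lazy_step(p)
max_dev = max(abs(float(p[v]) - 1 / n) for v in p)  # distance from uniform
```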

The fact that $W$ and $W'$ are not symmetric matrices makes their analysis complicated. We will thus define new matrices. The normalized walk matrix is defined as

$$N = D^{-1/2} \cdot W \cdot D^{1/2} = D^{-1/2} \cdot A \cdot D^{-1/2}.$$

The normalized lazy walk matrix is defined as

$$N' = D^{-1/2} \cdot W' \cdot D^{1/2} = \frac{I + D^{-1/2} \cdot A \cdot D^{-1/2}}{2}.$$

Unlike $W$ and $W'$, both $N$ and $N'$ are symmetric.

Claim 3 The matrices $N$ and $W$ have the same eigenvalues and related eigenvectors.
Proof Suppose that $v$ is an eigenvector of $N$ with eigenvalue $\lambda$. Let $q = D^{1/2} \cdot v$. Then,

$$N \cdot v = \lambda \cdot v = D^{-1/2} \cdot W \cdot D^{1/2} \cdot v = D^{-1/2} \cdot W \cdot q.$$

Multiplying by $D^{1/2}$ on the left we obtain

$$W \cdot q = \lambda \cdot D^{1/2} \cdot v = \lambda \cdot q.$$

Therefore, $q$ is an eigenvector of $W$ with eigenvalue $\lambda$. ■

Observe that, by Claim 2, $W$ has eigenvector $D \cdot \mathbf{1}$ with eigenvalue 1. Therefore, by Claim 3, the normalized walk matrix $N$ has eigenvector $D^{1/2} \cdot \mathbf{1}$ with eigenvalue 1.
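A floating-point sketch of the last observation on a small hypothetical non-regular graph (triangle 0-1-2 with a pendant vertex 3). It builds $N = D^{-1/2} \cdot A \cdot D^{-1/2}$ entrywise as $1/\sqrt{d(i)\,d(j)}$ on edges, confirms that $N$ is symmetric, and checks that $N \cdot (D^{1/2} \cdot \mathbf{1}) = D^{1/2} \cdot \mathbf{1}$. The helper `matvec` is a made-up name.

```python
import math

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # hypothetical example graph
n = len(adj)
deg = [len(nbrs) for nbrs in adj]

# N[i][j] = 1/sqrt(d(i) d(j)) if {i, j} is an edge, else 0, i.e. D^{-1/2} A D^{-1/2}.
N = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in adj[i]:
        N[i][j] = 1.0 / math.sqrt(deg[i] * deg[j])

def matvec(M, x):
    """Plain matrix-vector product on lists of lists."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

is_symmetric = all(N[i][j] == N[j][i] for i in range(n) for j in range(n))
v = [math.sqrt(d) for d in deg]  # the vector D^{1/2} 1
Nv = matvec(N, v)                # should equal v (eigenvalue 1)
```

Row $i$ of the product is $\sum_{j \sim i} \sqrt{d(j)} / \sqrt{d(i)\,d(j)} = d(i)/\sqrt{d(i)} = \sqrt{d(i)}$, which is exactly what the code checks numerically.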
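4 Connections to Laplacians

We have already used the Laplacian $L = D - A$. The normalized Laplacian $\mathcal{L}$ is defined as

$$\mathcal{L} = D^{-1/2} \cdot L \cdot D^{-1/2}.$$

Claim 4 $N = I - \mathcal{L}$.

Indeed, $\mathcal{L} = D^{-1/2} (D - A) D^{-1/2} = I - D^{-1/2} \cdot A \cdot D^{-1/2} = I - N$.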
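Claim 4 can be checked entry by entry. This sketch assumes the convention $L = D - A$ and reuses a small hypothetical graph (a triangle with a pendant vertex). `s` holds the diagonal of $D^{-1/2}$, so conjugating by $D^{-1/2}$ amounts to scaling entry $(i, j)$ by `s[i] * s[j]`.

```python
import math

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # hypothetical example graph
n = len(adj)
deg = [len(nbrs) for nbrs in adj]

A = [[1 if j in adj[i] else 0 for j in range(n)] for i in range(n)]
L = [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]  # L = D - A
s = [1 / math.sqrt(d) for d in deg]  # diagonal of D^{-1/2}

calL = [[s[i] * L[i][j] * s[j] for j in range(n)] for i in range(n)]  # normalized Laplacian
N = [[s[i] * A[i][j] * s[j] for j in range(n)] for i in range(n)]     # D^{-1/2} A D^{-1/2}

# Largest entrywise difference between N and I - calL (should be round-off only).
max_err = max(abs(N[i][j] - ((1 if i == j else 0) - calL[i][j]))
              for i in range(n) for j in range(n))
```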

Therefore, the eigenvalues of $N$ are given by $1 - (\text{eigenvalues of } \mathcal{L})$. So, it makes sense to order them in the opposite way:

$$1 = \mu_1 \ge \mu_2 \ge \cdots \ge \mu_n.$$

We can now translate our theorems about the eigenvalues of Laplacians to theorems about the $\mu_i$'s. We have

• For each $i$, $\mu_i \in [-1, 1]$.

• If $G$ is connected, then $\mu_2 < 1$.

• The eigenvalue $-1$ occurs only for bipartite graphs.

Let $\mu'_1 \ge \mu'_2 \ge \cdots \ge \mu'_n$ be the eigenvalues of $N'$. Since $N' = (I + N)/2$, we have $\mu'_i = (1 + \mu_i)/2$, and therefore

• For each $i$, $\mu'_i \in [0, 1]$.

• If $G$ is connected, then $\mu'_2 < 1$.
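The cycle $C_n$ is a convenient test case for these facts. Every degree is 2, so $N = A/2$, and it is a standard fact that its eigenvalues are $\cos(2\pi k/n)$ for $k = 0, \ldots, n-1$, with eigenvectors $j \mapsto \cos(2\pi k j/n)$ (together with the matching sine vectors). The sketch below (function names hypothetical) verifies the cosine eigenvectors and prints the spectra for $C_5$ and $C_6$.

```python
import math

def cycle_normalized_eigs(n):
    """Eigenvalues of N = A/2 for the cycle C_n, sorted in decreasing order."""
    return sorted((math.cos(2 * math.pi * k / n) for k in range(n)), reverse=True)

def check_eigvec(n, k):
    """Max |(N v)(j) - mu v(j)| for v(j) = cos(2 pi k j / n) and mu = cos(2 pi k / n)."""
    mu = math.cos(2 * math.pi * k / n)
    v = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
    Nv = [(v[(j - 1) % n] + v[(j + 1) % n]) / 2 for j in range(n)]  # N = A/2 on C_n
    return max(abs(Nv[j] - mu * v[j]) for j in range(n))

for n in (5, 6):
    mus = cycle_normalized_eigs(n)
    lazy = [(1 + m) / 2 for m in mus]  # eigenvalues of N' = (I + N)/2
    print(f"C_{n}: smallest mu = {mus[-1]:+.3f}, "
          f"lazy eigenvalues in [{min(lazy):.3f}, {max(lazy):.3f}]")
```

The bipartite cycle $C_6$ has $\mu_6 = -1$, while the odd cycle $C_5$ does not. In both cases every lazy eigenvalue lies in $[0, 1]$.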

5 $\ell_2$ Convergence

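Define the spectral gap to be

$$\lambda = 1 - \mu'_2.$$

For probability distributions $p, q$, we define their $\ell_2$ distance to be

$$\|p - q\|_2 = \sqrt{\sum_{u \in V(G)} \big(p(u) - q(u)\big)^2}.$$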



The following theorem gives a bound on the rate of convergence of the lazy random walk to the stationary distribution $\pi$.

Theorem 5 Let $p_0$ be an arbitrary initial distribution, and let $p_t$ be the distribution after $t$ steps of the lazy random walk. Then,

$$\|p_t - \pi\|_2 \le (1 - \lambda)^t \sqrt{\frac{\max_x d(x)}{\min_y d(y)}}.$$



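Proof [Proof for regular graphs] Observe that for a matrix $M = Q^{-1} \cdot \Lambda \cdot Q$, we have $M^t = Q^{-1} \cdot \Lambda^t \cdot Q$. Thus, for an eigenvector $v$ of $M$ with eigenvalue $\mu$, we have $M^t \cdot v = \mu^t \cdot v$.

Recall that $N' = (I + D^{-1/2} \cdot A \cdot D^{-1/2})/2$. Since $G$ is regular, $D = d \cdot I$ for some integer $d > 0$. Thus,

$$N' = \frac{1}{2}\left(I + \frac{1}{d} A\right) = W',$$

so $p_t = N'^t \cdot p_0$, and the stationary distribution is simply the uniform distribution on $V(G)$:

$$\pi = \frac{1}{n} \cdot \mathbf{1}.$$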
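Let $v_1, \ldots, v_n$ be orthonormal eigenvectors of the symmetric matrix $N'$, where $v_i$ corresponds to the $i$-th eigenvalue $\mu'_i$, and let $c_i = v_i^T p_0$. We have

$$N'^t \cdot p_0 = \sum_{i=1}^n c_i \cdot \mu_i'^t \cdot v_i = c_1 \cdot v_1 + \sum_{i=2}^n c_i \cdot \mu_i'^t \cdot v_i.$$

Since $v_1 = \frac{1}{\sqrt{n}} \cdot \mathbf{1}$ and $c_1 = v_1^T p_0 = 1/\sqrt{n}$, we get $c_1 \cdot v_1 = \pi$, and it follows that

$$\|p_t - \pi\|_2^2 = \Big\| \sum_{i=2}^n c_i \cdot \mu_i'^t \cdot v_i \Big\|_2^2 = \sum_{i=2}^n c_i^2 \, \mu_i'^{2t} \le \mu_2'^{2t} \sum_{i=2}^n c_i^2 \le \mu_2'^{2t} \sum_{i=1}^n (v_i^T p_0)^2 = (1 - \lambda)^{2t} \|p_0\|_2^2 \le (1 - \lambda)^{2t},$$

where the first inequality uses $0 \le \mu'_i \le \mu'_2$ for $i \ge 2$, and the last uses $\|p_0\|_2 \le \|p_0\|_1 = 1$. Taking square roots gives the theorem, since $\max_x d(x) / \min_y d(y) = 1$ for a regular graph. ■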



Using a similar argument, we can also show an analogous bound for $\ell_\infty$ convergence.
Theorem 6 For any vertex $v \in V(G)$,

$$|p_t(v) - \pi(v)| \le (1 - \lambda)^t \sqrt{\frac{d(v)}{\min_y d(y)}}.$$
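As a numerical sanity check (not a proof), the sketch below runs the lazy walk on the regular graph $C_n$ and compares both distances with $(1 - \lambda)^t$. It assumes the standard cycle spectrum, under which the lazy eigenvalues are $(1 + \cos(2\pi k/n))/2$, so $\lambda = (1 - \cos(2\pi/n))/2$. The degree ratios in Theorems 5 and 6 equal 1 here. The function names are hypothetical, and the number of steps is kept small so that the bound stays far above floating-point round-off.

```python
import math

def lazy_cycle_walk(n, start, steps):
    """Yield p_0, ..., p_steps for the lazy walk on C_n (vertices 0..n-1)."""
    p = [0.0] * n
    p[start] = 1.0
    yield p
    for _ in range(steps):
        # keep 1/2, receive 1/4 from each of the two neighbours
        p = [p[v] / 2 + (p[(v - 1) % n] + p[(v + 1) % n]) / 4 for v in range(n)]
        yield p

def check_bounds(n, steps):
    """Largest ratio of the l2 or l_inf distance to (1 - lambda)^t; <= 1 means both bounds hold."""
    gap = (1 - math.cos(2 * math.pi / n)) / 2  # lambda = 1 - mu'_2 for C_n
    pi = 1.0 / n                               # uniform stationary distribution
    worst = 0.0
    for t, p in enumerate(lazy_cycle_walk(n, 0, steps)):
        l2 = math.sqrt(sum((x - pi) ** 2 for x in p))
        linf = max(abs(x - pi) for x in p)
        bound = (1 - gap) ** t
        worst = max(worst, l2 / bound, linf / bound)
    return worst

print(check_bounds(6, 40), check_bounds(9, 40))
```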



6 Conductance 

Cheeger's inequality carries over too, by replacing the isoperimetric number by a new parameter, which we call conductance $\Phi$.

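Definition 7 (Conductance) For $S \subset V(G)$, let

$$\Phi(S) = \frac{e(S)}{\min\left(\sum_{v \in S} d(v), \sum_{v \notin S} d(v)\right)},$$

where $e(S)$ denotes the number of edges with exactly one endpoint in $S$. We define the conductance to be

$$\Phi(G) = \min_{\emptyset \ne S \subsetneq V} \Phi(S).$$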

Using the above definition, Cheeger's inequality now becomes:

$$\Omega(1) \cdot \Phi^2(G) \le 1 - \mu_2 \le O(1) \cdot \Phi(G).$$
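For small graphs, $\Phi(G)$ can be computed by brute force over all nonempty proper subsets $S$. The running time is exponential, so this is only for illustration. The sketch (function name hypothetical, graph assumed to have no isolated vertices) then checks Cheeger's inequality on $C_6$, assuming the commonly quoted constants $\Phi^2/2 \le 1 - \mu_2 \le 2\Phi$ and using $\mu_2 = \cos(2\pi/n)$ for the cycle.

```python
import math
from fractions import Fraction
from itertools import combinations

def conductance(adj):
    """Brute-force Phi(G): min over nonempty proper S of e(S) / min(vol(S), vol(V - S))."""
    n = len(adj)
    deg = [len(nbrs) for nbrs in adj]
    best = None
    for size in range(1, n):
        for S in combinations(range(n), size):
            inside = set(S)
            cut = sum(1 for u in inside for v in adj[u] if v not in inside)  # e(S)
            vol_s = sum(deg[u] for u in inside)
            phi = Fraction(cut, min(vol_s, sum(deg) - vol_s))
            if best is None or phi < best:
                best = phi
    return best

cycle6 = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
k4 = [[j for j in range(4) if j != i] for i in range(4)]  # complete graph K_4

phi = conductance(cycle6)             # three consecutive vertices give 2/6
gap = 1 - math.cos(2 * math.pi / 6)   # 1 - mu_2 for C_6
holds = float(phi) ** 2 / 2 <= gap <= 2 * float(phi)
```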

The parameter $\Phi(G)$ is related to the rate of convergence to the stationary distribution. In particular, lower bounds on $\Phi(G)$ let us prove that a walk mixes quickly.

The intuitive interpretation of the connection between conductance and the rate of convergence is as 
follows. If a graph has high conductance, it is well-connected. Therefore, a large amount of probability mass 
can very quickly move from one part of the graph to another. 

7 Introduction to Monte Carlo methods 

Assume that we want to estimate $\pi = 3.1415\ldots$ by throwing darts at a dartboard consisting of a circle inscribed in a square.
Assume that the square corresponds to $[-1, 1] \times [-1, 1]$. If you pick a point in the square uniformly at random, the probability that you pick one inside the circle is equal to $\pi/4$. Suppose that you pick $n$ points in $[-1, 1] \times [-1, 1]$ uniformly at random. Then,

$$\mathbb{E}[\text{number of points inside circle}] = n \cdot \pi/4.$$

So, you can return the estimate

$$\pi \approx (\text{number of points inside circle}) \cdot 4/n.$$
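A minimal Python version of the dart-throwing estimator described above. The function name `estimate_pi` and the optional `seed` parameter, used for reproducibility, are illustrative choices.

```python
import random

def estimate_pi(n, seed=None):
    """Throw n uniform darts at [-1, 1] x [-1, 1]; return 4 * (hits inside the unit circle) / n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # dart landed inside the circle
            hits += 1
    return 4 * hits / n

print(estimate_pi(200_000, seed=1))
```

With $n = 200{,}000$ darts the standard deviation of the estimate is $4\sqrt{(\pi/4)(1 - \pi/4)/n} \approx 0.004$, so the printed value typically agrees with $\pi$ to about two decimal places.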

A natural question is how close this estimate would be to the right answer. 

In order to answer the above question, we will introduce the Chernoff bound. Suppose we have a random variable $r \in \{0, 1\}$ such that $\Pr[r = 1] = p$ and $\Pr[r = 0] = 1 - p$. Assume that we draw $n$ independent samples $r_1, \ldots, r_n$, and let $R = \sum_i r_i$. By the linearity of expectation, we have

$$\mathbb{E}[R] = \mathbb{E}\Big[\sum_i r_i\Big] = \sum_i \mathbb{E}[r_i] = n \cdot p.$$

We will say that $R$ $\varepsilon$-approximates $\mathbb{E}[R]$ if

$$(1 - \varepsilon)\,\mathbb{E}[R] \le R \le (1 + \varepsilon)\,\mathbb{E}[R].$$

This is a multiplicative error measure.

Theorem 8 (One version of the Chernoff bound) The probability that $R$ fails to $\varepsilon$-approximate $\mathbb{E}[R]$ is

$$\Pr\big[|R - \mathbb{E}[R]| > \varepsilon\,\mathbb{E}[R]\big] \le 2e^{-np\varepsilon^2/12} = 2e^{-\Omega(np\varepsilon^2)}.$$
Some notes on the above bound:

• The bound is nearly tight.

• The trials must be independent for the bound to hold.

• It provides a multiplicative, but not an additive, error guarantee.

• For fixed $\varepsilon$, it falls off exponentially in $n$. So, if $n$ trials give failure probability $1/2$, we can improve it to $1/2^k$ by performing $m = n \cdot k$ trials.

• Since the exponent is proportional to $np$, smaller $p$ requires more trials.

• If we want an $\varepsilon$-approximation with probability $1 - \delta$, then we need

$$n = \Theta\left(\frac{\log(1/\delta)}{p\,\varepsilon^2}\right).$$

That is, we need enough trials to get $\Theta(\log(1/\delta)/\varepsilon^2)$ successes.
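The trial count can be made explicit by solving $2e^{-np\varepsilon^2/12} \le \delta$ for $n$, assuming the constant 12 in the exponent of Theorem 8 (the standard two-sided bound $2e^{-np\varepsilon^2/3}$ for $0 < \varepsilon < 1$ is tighter, so 12 is a safe choice). The sketch also estimates the failure rate empirically with independent Bernoulli samples. All function names are hypothetical.

```python
import math
import random

def chernoff_bound(n, p, eps):
    """Failure-probability bound 2 * exp(-n p eps^2 / 12), assuming the constant 12."""
    return 2 * math.exp(-n * p * eps ** 2 / 12)

def trials_needed(p, eps, delta):
    """Smallest n with chernoff_bound(n, p, eps) <= delta."""
    return math.ceil(12 * math.log(2 / delta) / (p * eps ** 2))

def empirical_failure_rate(p, n, eps, batches, seed=0):
    """Fraction of batches whose sum R of n Bernoulli(p) draws misses [(1-eps)np, (1+eps)np]."""
    rng = random.Random(seed)
    mean = n * p
    failures = 0
    for _ in range(batches):
        R = sum(1 for _ in range(n) if rng.random() < p)
        if abs(R - mean) > eps * mean:
            failures += 1
    return failures / batches

for p in (0.5, 0.05, 0.005):
    print(f"p = {p}: n >= {trials_needed(p, 0.1, 0.01)} for eps = 0.1, delta = 0.01")
```

Dividing $p$ by 10 multiplies the required $n$ by 10, while the required number of successes $np$ stays the same, in line with the last note above.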
Back to the dartboard example: if we want to estimate $\pi$ within, say, 5% with probability at least 0.99, then we have $\varepsilon = 0.05$ and $\delta = 1/100$. Therefore, we need

$$n = \Theta\left(\frac{\log(100)}{(\pi/4)(0.05)^2}\right).$$
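To put a number on this, assume the explicit constant 12 in the exponent of Theorem 8. Requiring $2e^{-np\varepsilon^2/12} \le \delta$ with $p = \pi/4$ gives roughly thirty thousand darts:

```latex
n \ge \frac{12 \ln(2/\delta)}{p\,\varepsilon^2}
  = \frac{12 \ln 200}{(\pi/4)(0.05)^2}
  \approx \frac{63.58}{0.001963}
  \approx 3.24 \times 10^4 .
```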



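Observe that it is easy to make $\delta$ smaller, but it is harder to make $\varepsilon$ smaller: the number of trials grows only logarithmically in $1/\delta$ but quadratically in $1/\varepsilon$.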

If our darts rarely hit the target, then we run into trouble. This happens if we have a big dartboard and a small circle, so that the success probability $p$ is small.

In particular, if $p$ is exponentially small, then we need exponentially many trials to expect a constant number of successes.
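A small sketch of the "big dartboard, small circle" problem. It keeps the square $[-1, 1] \times [-1, 1]$ and shrinks the circle to radius $r$, so the hit probability is $p = \pi r^2/4$. The function names are hypothetical. Taking $r = 2^{-k}$ makes $p$ fall like $4^{-k}$, which is exponentially small in $k$, and the number of darts needed for a fixed expected number of hits grows at the same rate.

```python
import math

def hit_probability(r):
    """Chance a uniform dart in [-1, 1] x [-1, 1] lands in a centred circle of radius r (0 < r <= 1)."""
    return math.pi * r * r / 4

def expected_trials_for(successes, r):
    """Number of darts for which the expected number of hits equals `successes`."""
    return successes / hit_probability(r)

for k in range(0, 31, 10):
    r = 2.0 ** (-k)
    print(f"r = 2^-{k}: p = {hit_probability(r):.3e}, "
          f"darts for 100 expected hits = {expected_trials_for(100, r):.3e}")
```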

We can also run into trouble if it is hard to throw darts at all. That is, if it is hard to draw samples 
uniformly at random from the ambient space. We will develop some techniques for fixing the above problems 
in certain scenarios. 